Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs
Abstract
BACKGROUND The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as FANTOM3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of 'transcription noise'. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner.
PRINCIPAL FINDINGS We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases, the flanking regions show no homology at all; in the remainder, the degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs; among these, genes implicated in carcinogenesis are significantly over-represented.
CONCLUSION Our comparative mammalian genomics approach, coupled with evolutionary analysis, identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that among the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
Similar resources
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-protein-coding transcripts, in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
SNHG6 203 Transcript Could be Applied as an Auxiliary Factor for more Precise Staging of Breast Cancer
Background: Long non-coding RNAs are now recognized as an interesting functional part of the transcriptome. The lncRNA SNHG6 has been reported to be more highly expressed in breast cancer tissues than in non-tumor ones. As breast cancer is a frequent cancer among women, its treatment needs applicable biomarkers for fast prognosis and diagnosis. SNHG6 RNA and its splice variants could be considered as molecula...
Identification and Functional Prediction of Long Non-Coding RNAs Responsive to Drought stress in Lens culinaris L.
Drought stress is one of the main environmental factors that affect the growth and productivity of crop plants, including lentil. In the course of evolution, crucial genetic regulation mediated by non-coding RNAs (ncRNAs) has emerged in plants in response to drought and other abiotic stresses. In the present study, after identifying lncRNAs within the expression profile of lentil, RNA-s...
P87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders
Precise interpretation of transcriptome sequences in several species has shown that most of the genome is transcribed; however, only a small fraction of the transcribed sequences have open reading frames that are conserved during evolution. It is therefore unlikely that many of the transcribed sequences code for proteins. Among all human non-coding transcripts, at least 10000...
SNHG6 203 and SNHG6 201 Transcripts Can be Used as Contributory Factors for a Well-Timed Prognosis and Diagnosis of Colorectal Cancer
Background: Long non-coding RNAs, a large class of non-coding RNAs, are now considered more functionally important than in the past. These transcripts could be involved in carcinogenesis. SNHG6, a long non-coding RNA, has been reported to be more highly expressed in colorectal cancer tissues than in non-cancerous ones. Colorectal cancer, as a malignancy, needs fast prognostic and diagnostic methods for well...